Sequence-Structure Patterns: Discovery and Applications
Abstract
Protein sequence data is being generated at a tremendous rate; however, functional annotation of these proteins is proceeding at a much slower pace. Biologists rely on computational biology and pattern recognition to predict the function of proteins. This approach is based on the fact that proteins that share a similar function often exhibit conserved sequence patterns. Such sequence patterns, or motifs, are derived from multiple sequence alignments and have been collected in databases such as PROSITE, PRINTS, SPAT, and eMOTIF. These patterns help to classify proteins into families where the exact function may or may not be known. Research has shown that these domain signatures often exhibit specific three-dimensional structures. In this paper, we show how, starting from a seed sequence pattern from any of the existing sequence pattern databases and using information from the protein structure databases, it is possible to design biologically meaningful sequence-structure patterns (SSPs). An important by-product of our method for generating sequence-structure patterns is an improved sequence alignment, as well as an improved structural alignment, of the proteins that belong to a family and contain that pattern. Validation was performed by matching the resulting SSPs to domains in the ASTRAL compendium associated with a family or super-family designation in the SCOP database. SSPs generated by this method were frequently either fully specific (no false positives), fully sensitive (no false negatives), or both (diagnostic).
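The terms "fully specific", "fully sensitive", and "diagnostic" can be illustrated with a small Python sketch. This is not the authors' method: it only scores a plain PROSITE-style sequence pattern against labelled sequences and ignores the structural component of an SSP. The function names `prosite_to_regex` and `classify_pattern`, the pattern, and the toy sequences are all hypothetical, and the converter assumes a simplified PROSITE syntax without the `<` and `>` terminus anchors.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex.

    Supported elements: a residue letter, x (any residue), [ABC] (one of),
    {ABC} (none of), each with an optional repeat such as (3) or (2,4).
    The '<' and '>' terminus anchors are not handled in this sketch.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Separate the element core from an optional repeat count.
        m = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        # A single letter or a [ABC] class is already valid regex.
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


def classify_pattern(pattern: str, labelled_sequences) -> str:
    """Label a pattern using the abstract's terminology.

    labelled_sequences is an iterable of (sequence, in_family) pairs.
    """
    regex = re.compile(prosite_to_regex(pattern))
    false_positives = 0
    false_negatives = 0
    for sequence, in_family in labelled_sequences:
        hit = regex.search(sequence) is not None
        if hit and not in_family:
            false_positives += 1
        elif in_family and not hit:
            false_negatives += 1
    if false_positives == 0 and false_negatives == 0:
        return "diagnostic"
    if false_positives == 0:
        return "fully specific"
    if false_negatives == 0:
        return "fully sensitive"
    return "neither"


# Toy data (invented for illustration): two family members, two non-members.
toy_family = [
    ("MKCAACGHLL", True),
    ("TTCQWCSHEE", True),
    ("AACGGCPHKK", False),
    ("GGGGGGGG", False),
]
result = classify_pattern("C-x(2)-C-{P}-H", toy_family)
```

In a real validation the labelled sequences would come from a curated resource such as the SCOP family designations mentioned above, and the match would also have to satisfy the structural constraints of the pattern.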
Similar resources
A Framework for Exploring the Frequent Patterns based on Activities Sequence
In recent years, the growing use of location-based tools has made it possible to produce geometric trajectories from users' movement paths. In this way, users' travel goals and related activities can be considered in addition to the geometry and route shape. The user activity trajectory represents the sequence of the visited activities and its related analysis as presented i...
Single Nucleotide Polymorphisms and Association Studies: A Few Critical Points
Uncovering DNA sequence variations that correlate with phenotypic changes, e.g., diseases, is the aim of sequence variation studies. A common type of sequence variation is the single nucleotide polymorphism (SNP, pronounced snip). SNPs are third-generation molecular markers. An SNP represents a DNA sequence variant of a single base pair with the minor allele occurring in more than 1% of a given popula...
Query Driven Sequence Pattern Mining
The discovery of frequent patterns present in biological sequences has a large number of applications, including classification, clustering, and understanding sequence structure and function. This paper presents an algorithm that discovers frequent sequence patterns (motifs) present in a query sequence with respect to a database of sequences. The query is used to guide the mining process and th...
Discovery of Novel Peptidomimetics for Brain-Derived Neurotrophic Factor using Phage Display Technology
Brain-Derived Neurotrophic Factor (BDNF) is a neuroprotectant candidate for neurodegenerative diseases. However, there are several clinical concerns about its therapeutic applications. In the current study, we selected BDNF-mimicking small peptides from a phage-displayed peptide library as alternative molecules that address these clinical challenges. The peptide library was screened against BDNF receptor (Ne...
MULTIDIMENSIONAL LONGEST COMMON SUBSEQUENCE DISCOVERY From LARGE DATABASE USING DNA OPERATIONS
A central problem in the analysis of biological sequences is the discovery of sequence similarity of various kinds in the primary structure of related proteins and genes. This sequence search can be applied to tasks such as the discovery of association rules, strong rules, correlations, sequential rules, frequent episodes, multidimensional patterns, and many other important discovery tasks. In t...
Algorithms for pattern matching and discovery in RNA secondary structure
Text-indexing structures provide significant advantages in the solution of many problems related to string analysis and comparison, and are nowadays widely used in the analysis of biological sequences. In this paper, we present some applications of affix trees to problems of exact and approximate pattern matching and discovery in RNA sequences. By allowing bidirectional search for symmetric pat...
Publication date: 2005